MixUp as Locally Linear Out-Of-Manifold Regularization
MixUp is a recently proposed data-augmentation scheme that linearly
interpolates a random pair of training examples and, correspondingly, the
one-hot representations of their labels. Training deep neural networks with
such additional data has been shown to significantly improve the predictive
accuracy of state-of-the-art models. The power of MixUp, however, has been
established primarily empirically, and how and why it works has not been
explained in any depth. In this paper, we develop an understanding of MixUp as
a form of "out-of-manifold regularization", which imposes certain "local
linearity" constraints on the model's input space beyond the data manifold.
This analysis enables us to identify a limitation of MixUp, which we call
"manifold intrusion". In a nutshell, manifold intrusion in MixUp is a form of
under-fitting resulting from conflicts between the synthetic labels of the
mixed-up examples and the labels of original training data. Such a phenomenon
usually happens when the parameters controlling the generation of mixing
policies are not sufficiently fine-tuned on the training data. To address this
issue, we propose a novel adaptive version of MixUp, where the mixing policies
are automatically learned from the data using an additional network and
objective function designed to avoid manifold intrusion. The proposed
regularizer, AdaMixUp, is empirically evaluated on several benchmark datasets.
Extensive experiments demonstrate that AdaMixUp improves upon MixUp when
applied to state-of-the-art deep classification models.
Comment: Accepted by AAAI201
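A minimal NumPy sketch of the basic MixUp interpolation summarized above, assuming 2-D feature arrays and integer class labels. Drawing the mixing coefficient from Beta(alpha, alpha) follows the common MixUp recipe and is an assumption, not something stated in this abstract. The helper `intrudes` is a hypothetical and deliberately crude proxy for manifold intrusion: it flags a mixed point that lands within `radius` of a real training example whose label differs from the argmax of the synthetic label. It does not reproduce AdaMixUp's learned mixing policy.

```python
import numpy as np


def mixup_batch(x, y, num_classes, alpha=1.0, rng=None):
    """Mix every example in a batch with a randomly chosen partner from the same batch.

    x: float array of shape (batch, features); y: int class indices of shape (batch,).
    Assumption: the mixing coefficient is drawn from Beta(alpha, alpha), as in the
    usual MixUp recipe; alpha=1.0 is an illustrative default.
    """
    rng = np.random.default_rng() if rng is None else rng
    one_hot = np.eye(num_classes)[y]            # (batch, num_classes) one-hot labels
    lam = rng.beta(alpha, alpha)                # a single coefficient for the whole batch
    perm = rng.permutation(len(x))              # partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]     # interpolate the inputs...
    y_mix = lam * one_hot + (1.0 - lam) * one_hot[perm]  # ...and their one-hot labels
    return x_mix, y_mix, lam, perm


def intrudes(x_mix, y_mix, x_train, y_train, radius):
    """Hypothetical, crude proxy for manifold intrusion (not the paper's criterion).

    Returns True if some real training example lies within `radius` of the mixed
    point x_mix but carries a label different from the argmax of the soft label y_mix.
    """
    dists = np.linalg.norm(x_train - x_mix, axis=1)
    conflicting = y_train != np.argmax(y_mix)
    return bool(np.any((dists < radius) & conflicting))
```

Calling `mixup_batch(x, y, num_classes=3)` on a batch of three 2-D points returns mixed inputs plus soft labels whose rows each sum to 1; a mixed point that falls right next to a real example of a third class is the kind of label conflict the abstract calls manifold intrusion.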
Code-Style In-Context Learning for Knowledge-Based Question Answering
Current methods for Knowledge-Based Question Answering (KBQA) usually rely on
complex training techniques and model frameworks, leading to many limitations
in practical applications. Recently, the emergence of In-Context Learning (ICL)
capabilities in Large Language Models (LLMs) provides a simple and
training-free semantic parsing paradigm for KBQA: Given a small number of
questions and their labeled logical forms as demo examples, LLMs can understand
the task intent and generate the logical form for a new question. However,
current powerful LLMs have little exposure to logical forms during pre-training,
resulting in a high format error rate. To solve this problem, we propose a
code-style in-context learning method for KBQA, which converts the generation
of unfamiliar logical forms into the more familiar task of code generation for
LLMs. Experimental results on three mainstream datasets show that our method
dramatically mitigates format errors when generating logical forms, while
achieving new state-of-the-art (SOTA) results on WebQSP, GrailQA, and GraphQ in
the few-shot setting.
Comment: work in progress
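A minimal sketch of the code-style idea, assuming S-expression logical forms (the format GrailQA uses, for example): each operator is rendered as a nested function call so that the demo examples read like ordinary code. The nested-call rendering, the `expression =` prompt layout, and the relation and entity identifiers in the usage line are hypothetical illustrations, not the authors' actual prompt format.

```python
import json


def tokenize(sexpr):
    """Split an S-expression string into "(", ")" and atom tokens."""
    return sexpr.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front of the list and return a nested list tree."""
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the matching ")"
        return node
    return tok


def to_code(node):
    """Render a parsed S-expression as a nested, Python-style function call.

    Hypothetical format: operators become function names, atoms become string literals.
    """
    if isinstance(node, str):
        return json.dumps(node)
    op, *args = node
    return f"{op}({', '.join(to_code(arg) for arg in args)})"


def sexpr_to_code(sexpr):
    return to_code(parse(tokenize(sexpr)))


def build_prompt(demos, question):
    """Build a few-shot prompt from (question, S-expression) demo pairs plus a new question."""
    blocks = [f"# Question: {q}\nexpression = {sexpr_to_code(lf)}\n" for q, lf in demos]
    blocks.append(f"# Question: {question}\nexpression =")
    return "\n".join(blocks)
```

For instance, `sexpr_to_code("(JOIN film.film.directed_by m.0abc)")` returns `JOIN("film.film.directed_by", "m.0abc")`, where `m.0abc` is a made-up entity id. The prompt ends with an open `expression =` line for the LLM to complete; the completion would then have to be mapped back to a logical form before execution.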
Narrowing the Gap between Supervised and Unsupervised Sentence Representation Learning with Large Language Model
Sentence Representation Learning (SRL) is a fundamental task in Natural
Language Processing (NLP), with Contrastive learning of Sentence Embeddings
(CSE) as the mainstream technique due to its superior performance. An
intriguing phenomenon in CSE is the significant performance gap between
supervised and unsupervised methods, even when their sentence encoder and loss
function are the same. Previous works attribute this performance gap to
differences in two representation properties (alignment and uniformity).
However, alignment and uniformity only measure the outcome of training, so they
cannot answer "What happens during the training process that leads to the
performance gap?" and "How can the performance gap be narrowed?". In this
paper, we conduct empirical experiments to answer these "What" and "How"
questions. We first answer the "What" question by thoroughly comparing the
behavior of supervised and unsupervised CSE during their respective training
processes. From the comparison, we observe a significant difference in fitting
difficulty. Thus, we introduce a metric, called Fitting Difficulty Increment
(FDI), to measure the fitting difficulty gap between the evaluation dataset and
the held-out training dataset, and use the metric to answer the "What"
question. Then, based on the insights gained from the "What" question, we
tackle the "How" question by increasing the fitting difficulty of the training
dataset. We achieve this by leveraging the In-Context Learning (ICL) capability
of the Large Language Model (LLM) to generate data that simulates complex
patterns. By utilizing the hierarchical patterns in the LLM-generated data, we
effectively narrow the gap between supervised and unsupervised CSE.
Comment: work in progress
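A minimal NumPy sketch of the quantities this abstract refers to, assuming embeddings arrive as rows of a 2-D array, with row i of `z1` and row i of `z2` forming a positive pair. `alignment` and `uniformity` follow the common definitions from the contrastive-learning literature (α = 2 and t = 2 are assumed defaults), and `info_nce` is a generic in-batch contrastive loss of the kind used in CSE; the temperature of 0.05 is an assumed value. The sketch does not attempt to compute FDI.

```python
import numpy as np


def l2_normalize(z):
    """Project each row onto the unit sphere."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def alignment(z1, z2, alpha=2):
    """Mean distance**alpha between positive pairs; lower means positives sit closer together."""
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    return float(np.mean(np.linalg.norm(z1 - z2, axis=1) ** alpha))


def uniformity(z, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs; lower means a more even spread."""
    z = l2_normalize(z)
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(z), k=1)  # each unordered pair of distinct rows once
    return float(np.log(np.mean(np.exp(-t * sq_dists[iu]))))


def info_nce(z1, z2, temperature=0.05):
    """In-batch contrastive loss: row i of z2 is the positive for row i of z1,
    and every other row of z2 serves as a negative. Temperature is an assumption."""
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

Because `alignment(z, z)` compares each embedding with itself, it returns 0; `uniformity` is never positive, and more negative values indicate embeddings spread more evenly over the unit sphere. Both are computed from the embeddings after training, which is the limitation the abstract points out.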